AITopics

1907.0033

Country: Asia (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.99)
(2 more...)

Taele, Paul, Hammond, Tracy

Hashigo: A Next Generation Sketch Interactive System for Japanese Kanji

arXiv.org Artificial Intelligence, Apr-22-2025

Language students can increase their effectiveness in learning written Japanese by mastering the visual structure and written technique of Japanese kanji. Yet, existing kanji handwriting recognition systems do not assess the written technique sufficiently enough to discourage students from developing bad learning habits. In this paper, we describe our work on Hashigo, a kanji sketch interactive system which achieves human instructor-level critique and feedback on both the visual structure and written technique of students' sketched kanji. This type of automated critique and feedback allows students to target and correct specific deficiencies in their sketches that, if left untreated, are detrimental to effective long-term kanji learning.

artificial intelligence, handwriting recognition, student, (15 more...)

2504.1394

Country:

Asia > Japan > Honshū (0.28)
North America > United States > Texas > Brazos County > College Station (0.14)

Genre: Research Report (0.40)

Industry:

Education > Curriculum > Subject-Specific Education (0.94)
Education > Educational Setting (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision > Sketch Understanding (1.00)
Information Technology > Artificial Intelligence > Vision > Handwriting Recognition (1.00)

arXiv.org Artificial Intelligence, Nov-22-2023

ViStruct: Visual Structural Knowledge Extraction via Curriculum Guided Code-Vision Representation

Chen, Yangyi, Wang, Xingyao, Li, Manling, Hoiem, Derek, Ji, Heng

State-of-the-art vision-language models (VLMs) still have limited performance in structural knowledge extraction, such as relations between objects. In this work, we present ViStruct, a training framework to learn VLMs for effective visual structural knowledge extraction. Two novel designs are incorporated. First, we propose to leverage the inherent structure of programming language to depict visual structural information. This approach enables explicit and consistent representation of visual structural information of multiple granularities, such as concepts, relations, and events, in a well-organized structured format. Second, we introduce curriculum-based learning for VLMs to progressively comprehend visual structures, from fundamental visual concepts to intricate event structures. Our intuition is that lower-level knowledge may contribute to complex visual structure understanding. Furthermore, we compile and release a collection of datasets tailored for visual structural knowledge extraction. We adopt a weakly-supervised approach to directly generate visual event structures from captions for ViStruct training, capitalizing on abundant image-caption pairs from the web. In experiments, we evaluate ViStruct on visual structure prediction tasks, demonstrating its effectiveness in improving the understanding of visual structures. The code is public at https://github.com/Yangyi-Chen/vi-struct.

computational linguistic, knowledge, vistruct, (13 more...)

2311.13258

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Florida > Orange County > Orlando (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
(22 more...)

Genre:

Research Report (1.00)
Overview (0.68)

Industry: Education (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(4 more...)

arXiv.org Artificial Intelligence, Nov-4-2023

Stable Diffusion Reference Only: Image Prompt and Blueprint Jointly Guided Multi-Condition Diffusion Model for Secondary Painting

Ai, Hao, Sheng, Lu

Stable Diffusion and ControlNet have achieved excellent results in the field of image generation and synthesis. However, due to the granularity and method of their control, the efficiency improvement is limited for professional artistic creations such as comics and animation production, whose main work is secondary painting. In the current workflow, fixing characters and image styles often needs lengthy text prompts and may even require further training through TextualInversion, DreamBooth or other methods, which is very complicated and expensive for painters. Therefore, we present a new method in this paper, Stable Diffusion Reference Only, an images-to-image self-supervised model that uses only two types of conditional images for precise control generation to accelerate secondary painting. The first type of conditional image serves as an image prompt, supplying the necessary conceptual and color information for generation. The second type is a blueprint image, which controls the visual structure of the generated image. It is natively embedded into the original UNet, eliminating the need for ControlNet. We released all the code for the module and pipeline, and trained a controllable character line art coloring model, available at https://github.com/aihao2000/stable-diffusion-reference-only, which achieved state-of-the-art results in this field. This verifies the effectiveness of the structure and greatly improves the production efficiency of animations, comics, and fanworks.

controlnet, reference only, stable diffusion reference only, (12 more...)

2311.02343

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems, Apr-6-2023, 17:09:00 GMT

A Productive, Systematic Framework for the Representation of Visual Structure

We describe a unified framework for the understanding of structure representation in primate vision. A model derived from this framework is shown to be effectively systematic in that it has the ability to interpret and associate together objects that are related through a rearrangement of common "middle-scale" parts, represented as image fragments. The model addresses the same concerns as previous work on compositional representation through the use of what+where receptive fields and attentional gain modulation. It does not require prior exposure to the individual parts, and avoids the need for abstract symbolic binding. The focus of theoretical discussion in visual object processing has recently started to shift from problems of recognition and categorization to the representation of object structure.

representation, visual field, (16 more...)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.31)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.32)

Neural Information Processing Systems, Apr-6-2023, 16:38:16 GMT

Probabilistic principles in unsupervised learning of visual structure: human data and a model

To find out how the representations of structured visual objects depend on the co-occurrence statistics of their constituents, we exposed subjects to a set of composite images with tight control exerted over (1) the conditional probabilities of the constituent fragments, and (2) the value of Barlow's criterion of "suspicious coincidence" (the ratio of joint probability to the product of marginals). We then compared the part verification response times for various probe/target combinations before and after the exposure. For composite probes, the speedup was much larger for targets that contained pairs of fragments perfectly predictive of each other, compared to those that did not. This effect was modulated by the significance of their co-occurrence as estimated by Barlow's criterion. For lone-fragment probes, the speedup in all conditions was generally lower than for composites.

probabilistic principle, unsupervised learning, visual structure, (4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.40)

arXiv.org Artificial Intelligence, Apr-13-2019

LiveSketch: Query Perturbations for Guided Sketch-based Visual Search

Collomosse, John, Bui, Tu, Jin, Hailin

LiveSketch is a novel algorithm for searching large image collections using hand-sketched queries. LiveSketch tackles the inherent ambiguity of sketch search by creating visual suggestions that augment the query as it is drawn, making query specification an iterative rather than one-shot process that helps disambiguate users' search intent. Our technical contributions are: a triplet convnet architecture that incorporates an RNN based variational autoencoder to search for images using vector (stroke-based) queries; real-time clustering to identify likely search intents (and so, targets within the search embedding); and the use of backpropagation from those targets to perturb the input stroke sequence, so suggesting alterations to the query in order to guide the search. We show improvements in accuracy and time-to-task over contemporary baselines using a 67M image corpus.

artificial intelligence, machine learning, sketch, (20 more...)

1904.06611

Genre: Research Report (0.82)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)
Information Technology > Sensing and Signal Processing > Image Processing (0.94)
(2 more...)

AAAI Conferences, Jul-14-2009

Hashigo: A Next-Generation Sketch Interactive System for Japanese Kanji

Taele, Paul (Texas A&M University) | Hammond, Tracy (Texas A&M University)

Language students can increase their effectiveness in learning written Japanese by mastering the visual structure and written technique of Japanese kanji. Yet, existing kanji handwriting recognition systems do not assess the written technique sufficiently enough to discourage students from developing bad learning habits. In this paper, we describe our work on Hashigo, a kanji sketch interactive system which achieves human instructor-level critique and feedback on both the visual structure and written technique of students’ sketched kanji. This type of automated critique and feedback allows students to target and correct specific deficiencies in their sketches that, if left untreated, are detrimental to effective long-term kanji learning.

artificial intelligence, sketch understanding, student, (16 more...)

AAAI Conferences

Twenty-First IAAI Conference

Country:

North America > United States > Texas > Brazos County > College Station (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
(7 more...)

Industry:

Education > Curriculum > Subject-Specific Education (1.00)
Education > Educational Setting (0.94)

Technology: Information Technology > Artificial Intelligence > Vision > Sketch Understanding (1.00)

Edelman, Shimon, Hiles, Benjamin P., Yang, Hwajin, Intrator, Nathan

Probabilistic principles in unsupervised learning of visual structure: human data and a model

Neural Information Processing Systems, Dec-31-2002

To find out how the representations of structured visual objects depend on the co-occurrence statistics of their constituents, we exposed subjects to a set of composite images with tight control exerted over (1) the conditional probabilities of the constituent fragments, and (2) the value of Barlow's criterion of "suspicious coincidence" (the ratio of joint probability to the product of marginals). We then compared the part verification response times for various probe/target combinations before and after the exposure. For composite probes, the speedup was much larger for targets that contained pairs of fragments perfectly predictive of each other, compared to those that did not. This effect was modulated by the significance of their co-occurrence as estimated by Barlow's criterion. For lone-fragment probes, the speedup in all conditions was generally lower than for composites. These results shed light on the brain's strategies for unsupervised acquisition of structural information in vision.
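
As a rough illustration of the "suspicious coincidence" ratio mentioned in the abstract (joint probability divided by the product of marginals), the sketch below estimates it from simple presence counts. The function name `suspicious_coincidence`, the representation of each image as a set of fragment labels, and the toy data are assumptions made for illustration; they are not taken from the paper, which does not describe its estimation procedure here.

```python
def suspicious_coincidence(scenes, a, b):
    """Estimate P(a, b) / (P(a) * P(b)) from a list of scenes.

    Each scene is a set of fragment labels (hypothetical representation).
    A ratio near 1 suggests the fragments co-occur about as often as
    independence would predict; a ratio well above 1 flags a
    "suspicious" co-occurrence.
    """
    n = len(scenes)
    if n == 0:
        raise ValueError("need at least one scene")
    p_a = sum(a in s for s in scenes) / n
    p_b = sum(b in s for s in scenes) / n
    p_ab = sum(a in s and b in s for s in scenes) / n
    if p_a == 0 or p_b == 0:
        raise ValueError("both fragments must occur at least once")
    return p_ab / (p_a * p_b)


# Toy data: fragments "A" and "B" always appear together.
scenes = [{"A", "B"}, {"A", "B"}, {"C"}, {"D"}]
ratio = suspicious_coincidence(scenes, "A", "B")
print(ratio)
```

In this toy set, A and B are perfectly predictive of each other, so the ratio exceeds 1; fragments that co-occur only at chance level would give a ratio close to 1.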

experiment, probability, representation, (13 more...)

Country:

North America > United States > Rhode Island > Providence County > Providence (0.04)
North America > United States > North Carolina > Wake County > Cary (0.04)
North America > United States > New York > Tompkins County > Ithaca (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.53)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.36)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.36)
